Covariate Assisted Variable Ranking
Abstract
Consider a linear model y = Xβ + z, z ∼ N(0, σ²Iₙ). The Gram matrix Θ = (1/n)X′X is non-sparse, but it is approximately the sum of two components, a low-rank matrix and a sparse matrix, neither of which is known to us. We are interested in the Rare/Weak signal setting, where all but a small fraction of the entries of β are zero and the nonzero entries are individually small. The goal is to rank the variables so as to maximize the area under the ROC curve. We propose Factor-adjusted Covariate Assisted Ranking (FA-CAR), a two-step approach to variable ranking. In the FA-step, we use PCA to reduce the linear model to a new one whose Gram matrix is approximately sparse. In the CAR-step, we rank variables by exploiting the local covariate structures. FA-CAR is easy to use and computationally fast, and it is effective in resolving signal cancellation, a challenge we face in regression models. FA-CAR is related to the recent idea of Covariate Assisted Screening and Estimation (CASE), but the two methods serve different goals and are thus very different. We compare the ROC curve of FA-CAR with those of several other ranking methods in numerical experiments and show that FA-CAR has several advantages. Using a Rare/Weak signal model, we derive the convergence rate of the minimum sure-screening model size of FA-CAR. Our theoretical analysis contains several new ingredients, especially a new perturbation bound for PCA.
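The FA-step described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of "use PCA to reduce the linear model": project the top K left singular directions of X out of both X and y. The function names, the choice K = 2, and the simulated design are all hypothetical and are not taken from the paper. The CAR-step is not specified in the abstract, so the ranking below uses plain marginal correlation scores as a stand-in, and the AUC helper simply measures how well a ranking separates signal from null variables.

```python
import numpy as np

def factor_adjust(X, y, n_factors):
    """Assumed FA-step: remove the top `n_factors` principal directions from X and y."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :n_factors]                      # leading left singular vectors
    X_adj = X - U_k @ (U_k.T @ X)               # design with factors projected out
    y_adj = y - U_k @ (U_k.T @ y)               # response projected the same way
    return X_adj, y_adj

def marginal_scores(X, y):
    """Placeholder ranking score (NOT the paper's CAR-step): |x_j' y| / ||x_j||."""
    return np.abs(X.T @ y) / np.linalg.norm(X, axis=0)

def ranking_auc(scores, is_signal):
    """Area under the ROC curve of a ranking, via the Mann-Whitney statistic."""
    pos, neg = scores[is_signal], scores[~is_signal]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def mean_abs_offdiag(G):
    """Average magnitude of the off-diagonal entries of a square matrix."""
    mask = ~np.eye(G.shape[0], dtype=bool)
    return np.abs(G[mask]).mean()

# Simulated design whose Gram matrix is roughly low-rank plus sparse (illustrative only).
rng = np.random.default_rng(0)
n, p, K = 200, 100, 2
F = rng.normal(size=(n, K))                     # latent factors
B = rng.normal(size=(p, K))                     # factor loadings
X = F @ B.T + rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 0.5                                  # a few weak signals
y = X @ beta + rng.normal(size=n)

X_adj, y_adj = factor_adjust(X, y, K)
print("mean |off-diagonal| of Gram before:", mean_abs_offdiag(X.T @ X / n))
print("mean |off-diagonal| of Gram after: ", mean_abs_offdiag(X_adj.T @ X_adj / n))
print("AUC of marginal ranking after FA-step:",
      ranking_auc(marginal_scores(X_adj, y_adj), beta != 0))
```

In this sketch the off-diagonal entries of the Gram matrix shrink after the adjustment, which is the sense in which the reduced model is "approximately sparse"; any ranking rule can then be compared on the adjusted data through its AUC.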
Related articles
A Simple Bayesian Algorithm for Feature Ranking in High Dimensional Regression Problems
Variable selection or feature ranking is a problem of fundamental importance in modern scientific research, where data sets comprising hundreds of thousands of potential predictor features and only a few hundred samples are not uncommon. This paper introduces a novel Bayesian algorithm for feature ranking (BFR) which does not require any user-specified parameters. The BFR algorithm is very gener...
Ranked Set Sampling Based on Binary Water Quality Data with Covariates
A ranked set sample (RSS) is composed of independent order statistics, formed by collecting and ordering independent subsamples, then measuring only one item from each subsample. If the cost of sampling is dominated by data measurement rather than collection or ranking, the RSS technique is known to be superior to ordinary sampling. Experiments based on binary data are not designed to exploit th...
Swiss-System Based Cascade Ranking for Gait-Based Person Re-Identification
Human gait has been shown to be an efficient biometric measure for person identification at a distance. However, it often needs different gait features to handle various covariate conditions including viewing angles, walking speed, carrying an object and wearing different types of shoes. In order to improve the robustness of gait-based person re-identification on such multi-covariate conditions...
Estimating and Modelling Bias of the Hierarchical Partitioning Public-Domain Software: Implications in Environmental Management and Conservation
BACKGROUND: Hierarchical partitioning (HP) is an analytical method of multiple regression that identifies the most likely causal factors while alleviating multicollinearity problems. Its use is increasing in ecology and conservation because of its usefulness in complementing multiple regression analysis. A public-domain software package, "hier.part", has been developed for running HP in R. Its a...
Covariate order tests for covariate effect.
A new approach for constructing tests for association between a random right censored life time variable and a covariate is proposed. The basic idea is to first arrange the observations in increasing order of the covariate and then base the test on a certain point process defined by the observation times. Tests constructed by this approach are robust against outliers in the covariate values or ...